Dynamical Sources in Information Theory: a General Analysis of Trie Structures
Authors
Abstract
Digital trees, also known as tries, are a general-purpose, flexible data structure that implements dictionaries built on sets of words. An analysis is given of three major representations of tries in the form of array-tries, list-tries, and bst-tries (ternary search tries). The size and the search costs of the corresponding representations are analysed precisely in the average case, while a complete distributional analysis of the height of tries is given. The unifying data model used is that of dynamical sources, and it encompasses classical models like those of memoryless sources with independent symbols, of finite Markov chains, and of nonuniform densities. The probabilistic behaviour of the main parameters, namely size, path length, or height, appears to be determined by two intrinsic characteristics of the source: the entropy and the probability of letter coincidence. These characteristics are themselves related in a natural way to spectral properties of specific transfer operators of the Ruelle type.
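The three parameters named in the abstract (size, path length, height) can be measured on a small simulated trie. The sketch below is only an illustration under stated assumptions, not the authors' method: it covers just the simplest case mentioned above, a memoryless source with independent symbols, for which the entropy is h = -sum p_i log p_i and the coincidence probability is c = sum p_i^2. The function names, the word length, and the seed are hypothetical choices made for this example, and finite words stand in for the infinite words of the model.

```python
import math
import random


def entropy(probs):
    """Entropy h = -sum p log p of a memoryless source (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def coincidence(probs):
    """Probability that two independent symbols coincide: c = sum p^2."""
    return sum(p * p for p in probs)


def build_trie(words, depth=0):
    """Build a trie from a non-empty list of distinct, equal-length strings.

    A leaf is the stored word itself; an internal node is a dict that maps
    the symbol at position `depth` to the subtrie of words sharing it.
    """
    if len(words) == 1:
        return words[0]
    groups = {}
    for w in words:
        groups.setdefault(w[depth], []).append(w)
    return {sym: build_trie(g, depth + 1) for sym, g in groups.items()}


def trie_stats(node, depth=0):
    """Return (size, path_length, height): the number of internal nodes,
    the sum of leaf depths, and the maximum leaf depth."""
    if isinstance(node, str):
        return 0, depth, depth
    size, path, height = 1, 0, 0
    for child in node.values():
        s, p, h = trie_stats(child, depth + 1)
        size += s
        path += p
        height = max(height, h)
    return size, path, height


def simulate(n, probs, length=200, seed=0):
    """Draw n >= 1 words from a memoryless source and measure their trie.

    `length` truncates the words (an assumption of this sketch); duplicates,
    which are very unlikely for long words, are discarded.
    """
    rng = random.Random(seed)
    symbols = [chr(ord("a") + i) for i in range(len(probs))]
    words = {"".join(rng.choices(symbols, weights=probs, k=length))
             for _ in range(n)}
    return trie_stats(build_trie(sorted(words)))
```

Calling `simulate(1000, [0.7, 0.3])` and comparing the result with `entropy([0.7, 0.3])` and `coincidence([0.7, 0.3])` gives a rough feel for the dependence the abstract describes. First-order estimates often quoted for this setting are about n/h for the size, (n log n)/h for the path length, and 2 log n / |log c| for the height; they are stated here as assumptions to check against the paper, not as results taken from this page.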
Similar resources
Smoothed Analysis of Trie Height
Tries are very simple general purpose data structures for information retrieval. A crucial parameter of a trie is its height. In the worst case the height is unbounded when the trie is built over a set of n strings. Analytical investigations have shown that the average height under many random sources is logarithmic in n. Experimental studies of trie height suggest that this holds for non-rando...
Information theory: Sources, Dirichlet series, and realistic analyses of data structures
Most text algorithms build data structures on words, mainly trees, such as digital trees (tries) or binary search trees (bst). The mechanism which produces the symbols of the words (one symbol per unit of time) is called a source in information-theoretic contexts. The probabilistic behaviour of the trees built on words emitted by the same source depends on two factors: the algorithmic properties o...
Dynamical Sources in Information Theory: A General Analysis of Trie Structures, Julien Clément
Digital trees, also known as tries, are a general-purpose, flexible data structure that implements dictionaries built on sets of words. An analysis is given of three major representations of tries in the form of array-tries, list-tries, and bst-tries (ternary search tries). The size and the search costs of the corresponding representations are analysed precisely in the average case, while a comp...
Towards More Realistic Probabilistic Models for Data Structures: The External Path Length in Tries under the Markov Model
Tries are among the most versatile and widely used data structures on words. They are pertinent to the (internal) structure of (stored) words and several splitting procedures used in diverse contexts ranging from document taxonomy to IP address lookup, from data compression (i.e., the Lempel-Ziv'77 scheme) to dynamic hashing, from partial-match queries to speech recognition, from leader election a...
Designing the Model of Information Anorexia Among Medical Students in the Hamedan University of Medical Sciences Using Grounded Theory
Objective: People with information anorexia severely limit the acquisition and use of information and lose the opportunity to receive new information, and often rely on a few limited sources of information. This study aims to investigate the information anorexia of medical students in Hamedan University of Medical Sciences (HUMS). Methods: This is a qualitative study based on grounded theory conduc...